Reactive Rebalancing for Scientific Simulations running on ExaScale High Performance Computers

Authors

Roel Wuyts

Karl Meerbergen

Pascal Costanza

Abstract

Exascale computers, the next generation of high performance computers, are expected to reach 1 exaflops around 2018. However, the processor cores used in these systems are very likely to suffer from unpredictable, highly variable performance. We built a prototype general-purpose reactive work rebalancer that handles such performance variability with low overhead. We validated it experimentally by developing a reactive rebalancer library in UPC and using it in a 5-point stencil (heat) simulation. The experiments show that our approach compensates for runtime processor speed variations with very limited overhead, with or without simulated processor slowdowns.
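The abstract does not describe how the rebalancer decides what to move, so the following is only a minimal sketch of the general idea, written in Python rather than UPC. It assumes a row-wise decomposition of the grid and a proportional policy: after each iteration, every worker's measured speed (rows per second) determines its share of rows for the next one. The names `rebalance` and `heat_step`, the `alpha` parameter, and the policy itself are illustrative assumptions, not the authors' library.

```python
def rebalance(rows, seconds):
    """Return new per-worker row counts proportional to measured speed.

    rows[i]    -- rows worker i processed in the last iteration
    seconds[i] -- wall-clock time worker i needed for those rows
    (Hypothetical policy; the paper's actual strategy is not given.)
    """
    total = sum(rows)
    # Speed in rows per second; a slowed-down core gets a smaller value.
    speeds = [r / s for r, s in zip(rows, seconds)]
    total_speed = sum(speeds)
    # Ideal (fractional) share of the grid for each worker.
    ideal = [total * v / total_speed for v in speeds]
    # Round down, then give the leftover rows to the largest remainders
    # so the counts still add up to the full grid height.
    counts = [int(x) for x in ideal]
    leftover = total - sum(counts)
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def heat_step(grid, alpha=0.25):
    """One explicit 5-point stencil update; boundary cells stay fixed."""
    h, w = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            neighbours = (grid[i - 1][j] + grid[i + 1][j] +
                          grid[i][j - 1] + grid[i][j + 1])
            new[i][j] = grid[i][j] + alpha * (neighbours - 4 * grid[i][j])
    return new


if __name__ == "__main__":
    # Four workers with 100 rows each; the third took twice as long.
    print(rebalance([100, 100, 100, 100], [1.0, 1.0, 2.0, 1.0]))
```

In this sketch the slow worker ends up with roughly half as many rows as the others, while the total number of rows is preserved. A real implementation would also have to move the corresponding grid data between workers and weigh that cost against the expected gain.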

Similar resources

Exploring reliability of exascale systems through simulations

Exascale computers are predicted to emerge by the end of this decade with millions of nodes and billions of concurrent cores/threads. One of the most critical challenges for exascale computing is how to effectively and efficiently maintain the system reliability. Checkpointing is the state-of-the-art technique for high-end computing system reliability that has proved to work well for current pet...

LNCS 7851 - High Performance Computing for Computational Science - VECPAR 2012

The development of an exascale computing capability with machines capable of executing O(10^18) operations per second by the end of the decade will be characterized by significant and dramatic changes in computing hardware architecture from current (2012) petascale high-performance computers. From the perspective of computational science, this will be at least as disruptive as the transition from ...

Modeling and Simulation of Dynamic Applications for Exascale Computing Platforms

Basics of experiment analysis with R is a plus. 1 Context: There is a continued need for higher compute performance: scientific grand challenges, engineering, geo-physics, bioinformatics, etc. Such studies used to be carried out on large ad hoc supercomputers, which, for economic reasons, were replaced by commodity clusters, i.e., sets of off-the-shelf computers interconnected by fast switche...

Exploring Energy Behaviors of I/O Management Approaches for Exascale Systems

The advent of fast, unprecedentedly scalable, yet energy-hungry exascale supercomputers poses a major challenge consisting in sustaining a high performance per watt ratio. While much recent work has explored new approaches to I/O management, aiming to reduce the I/O performance bottleneck exhibited by HPC applications (and hence to improve application performance), there is comparatively little...

Scalable and Highly Available Fault Resilient Programming Middleware for Exascale Computing

A hierarchical master-worker model is believed to be a promising programming paradigm that can achieve weak scaling on exascale-level high performance computers [1]. However, “fault resiliency” is one of the most important issues for exascale computing because the Mean Time Between Failure (MTBF) of such computers will be short [2]. We propose a fault resilient programming middleware called Fal...

Publication date: 2011
